Abstract

A chemical kinetic model of the elongation dynamics of RNA polymerase along a DNA sequence 
is introduced. The proposed model governs the discrete movement of the RNA polymerase along 
a DNA template, with no consideration given to elastic effects. The model's novel concept is a 
"look-ahead" feature, in which nucleotides bind reversibly to the DNA prior to being incorporated 
covalently into the nascent RNA chain. Results are presented for specific DNA sequences that 
have been used in single-molecule experiments of RNA polymerase along DNA. By replicating the 
data analysis algorithm from the experimental procedure, the model produces velocity histograms, 
enabling direct comparison with these published experimental results. 

RNA polymerase is the key enzyme of transcription, the step at which most regulation of
gene expression occurs. Transcription consists of three distinct processes: initiation, elongation,
and termination. Of these processes, elongation has been the least studied because
conventional experimental biological techniques have been unable to investigate the dynamical
properties of RNA polymerase during transcriptional elongation. Fortunately, this
situation has changed with the advent and extensive use of single-molecule force microscopy.

From a modeling perspective, elongation is the process in transcription most amenable to
a quantitative description. RNA polymerase elongation dynamics can be seen as a stochastic 
process, more specifically as a one-dimensional random walk of the polymerase molecule 
along the DNA. A stochastic model for the motion of RNA polymerase was proposed by 
Julicher and Bruinsma 0]. Others, most notably Wang et al 0], have studied force 
generation of RNA polymerase during elongation by approximating the internal strain of 
RNA polymerase using the concept of springs. 

In this letter, then, we introduce a formal chemical kinetic model for the dynamics of the 
movement of RNA polymerase along DNA. In our model we make no reference to continuous 
motions and focus instead on the discrete events of reversible binding and unbinding of 
nucleotides to the DNA, and on the covalent linkage of nucleotides into the nascent RNA 
chain. We apply this model to specific DNA sequences that have been used in actual 
experiments [2].

During elongation, the double stranded DNA is locally melted by the RNA polymerase 
over a distance of approximately 14-17 basepairs. This locally melted region is known 
as the transcription bubble. Within the transcription bubble, one strand of the DNA acts 
as a template, upon which complementary ribonucleotide triphosphates (ATP, GTP, CTP, 
and UTP) can reversibly bind and unbind to/from the DNA template strand. It has been 
hypothesized, however, that only a part of the transcription bubble is actually used for 
transcription. The size of this window of activity within the transcription bubble formed by 
the RNA polymerase is an integer parameter of our model. The binding of ribonucleotides 
within the window of activity is assumed to be reversible. 

An irreversible reaction, however, is the incorporation of a ribonucleotide into the nascent
RNA chain. This can occur only when that ribonucleotide is reversibly bound at the first site
of the window of activity, i.e., the site at the 3' end of the nascent RNA chain. When such
incorporation of a ribonucleotide into the nascent RNA chain occurs, we assume that the RNA
polymerase (and hence the transcription bubble and the window of activity) translocates
forward one basepair. Because the window of activity has a size of more than one basepair,
it is quite likely that when the polymerase molecule, and hence the window, moves forward,
it will already find the correct nucleotide bound at what has just become the site where that
nucleotide can be incorporated into the growing RNA chain. This is the 'lookahead' feature
of the model, a kind of parallel processing: placement of the correct ribonucleotide triphosphate
at each site on the template strand of the DNA can occur before that site has been
reached by the nascent RNA molecule.

FIG. 1: (a) The look-ahead window of RNA polymerase. Since the first site (left
end of box, indicated by tick mark) is unoccupied, the polymerase cannot move forward. Possible
events are the unbinding of C, G, or U, or the binding of any ribonucleotide triphosphate to any
of the 5 unoccupied sites. (b) Same as (a) except that the first site within the lookahead window
is also occupied. Possible events include, as in (a), the unbinding of any of the reversibly bound
ribonucleotide triphosphates or the binding of any ribonucleotide triphosphate (including incorrect
Watson-Crick basepairing) to any of the unoccupied sites. In this case, however, there is an
additional possible event because the first site is occupied, namely, the forward motion of RNA
polymerase, as depicted by the arrow in the figure. Note, in particular, that after this motion the
new first site in the window may again be occupied (as shown), leading to the possibility of another
forward step as a subsequent event.

The model is completely specified, then, by the following parameters: $w$ = length (in
basepairs) of the lookahead window; $(K_{\mathrm{on}})_{ij}$ = rate constant for reversible binding of
ribonucleotide of type $i$ (ATP, CTP, GTP, or UTP) to deoxyribonucleotide of type $j$ (A, C,
G, T) in the template strand within the window of activity; $(K_{\mathrm{off}})_{ij}$ = rate constant for
unbinding of reversibly bound ribonucleotide of type $i$ from deoxyribonucleotide of type $j$;
$(K_{\mathrm{forward}})_{ij}$ = rate constant for covalent incorporation of ribonucleotide of type $i$ into the
nascent RNA chain, provided that there is a ribonucleotide of type $i$ reversibly bound to
a deoxyribonucleotide of type $j$ at the first site of the window of activity. Note that we
consider not only correct Watson-Crick basepairings, but also the possibility of errors. The
parameter $(K_{\mathrm{on}})_{ij}$ is, of course, much larger, and $(K_{\mathrm{off}})_{ij}$ much smaller, when $(i,j)$ is a correct
Watson-Crick basepair than otherwise. This mechanism protects against errors in transcription.
Further error protection could be obtained by making $(K_{\mathrm{forward}})_{ij}$ larger when $(i,j)$ is
a correct Watson-Crick basepair than when it is not. In our simulations, however, we have
assumed that $K_{\mathrm{forward}}$ is constant, independent of $(i,j)$.

We model the movement of RNA polymerase along DNA using the Gillespie algorithm^] . 
For every possible transition a suitable rate constant is assigned: for each unoccupied site
within the window of activity, there are 4 binding rate constants, one for each of the
ribonucleotide triphosphates that can possibly occupy that site; if a site is occupied within the
window of activity, then there is a rate constant for the ribonucleotide triphosphate on that
site to come off; if the first site within the lookahead window is occupied, then there is a
rate constant for the RNA polymerase to translocate forward one basepair.
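
The bookkeeping of possible transitions can be made concrete with a short sketch. The Python function below lists every transition available in a given window state; the rate values, the names `list_transitions`, `CORRECT`, and the example window are illustrative assumptions, not values or code from the simulations reported here.

```python
NTPS = "ACGU"
CORRECT = {"A": "U", "C": "G", "G": "C", "T": "A"}  # template base -> matching NTP

# Hypothetical rate constants (1/s), chosen only for illustration.
K_ON_OK, K_ON_BAD = 10.0, 0.1
K_OFF_OK, K_OFF_BAD = 1.0, 100.0
K_FORWARD = 50.0


def list_transitions(template_window, bound):
    """Return (kind, site, ntp, rate) for every transition possible in this state.

    template_window: template bases inside the window, first site at index 0.
    bound: bound[s] is the NTP reversibly bound at site s, or None if empty.
    """
    transitions = []
    for site, (base, ntp) in enumerate(zip(template_window, bound)):
        if ntp is None:
            # An empty site can bind any of the four NTPs, correct or not.
            for candidate in NTPS:
                rate = K_ON_OK if candidate == CORRECT[base] else K_ON_BAD
                transitions.append(("bind", site, candidate, rate))
        else:
            # An occupied site has a single unbinding transition.
            rate = K_OFF_OK if ntp == CORRECT[base] else K_OFF_BAD
            transitions.append(("unbind", site, ntp, rate))
    if bound[0] is not None:
        # Forward translocation requires an occupied first site.
        transitions.append(("forward", 0, bound[0], K_FORWARD))
    return transitions


# A state like Fig. 1(a): window of 8, first site empty, three sites occupied.
state_a = [None, "C", None, "G", None, None, "U", None]
# A state like Fig. 1(b): same, but the first site is occupied as well.
state_b = ["A", "C", None, "G", None, None, "U", None]
print(len(list_transitions("TGCATGCA", state_a)))  # 4*5 + 3 + 0
print(len(list_transitions("TGCATGCA", state_b)))  # 4*4 + 4 + 1
```

With 5 empty and 3 occupied sites the count is 4·5 + 3 = 23; occupying the first site as well gives 4·4 + 4 + 1 = 21.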

TABLE I: Values of the rates ($(K_{\mathrm{forward}})_{ij}$, $(K_{\mathrm{on}})_{ij}$, and $(K_{\mathrm{off}})_{ij}$) and window size used in the
'look-ahead' simulations. $i$ and $j$ refer to ribonucleotide triphosphate (ATP, CTP, GTP, or UTP) and
deoxyribonucleotide (A, C, G, or T), respectively. These parameter values were chosen arbitrarily
to explore the model's behavior.

The Gillespie algorithm jumps from event to event. Let $t^n$ be the time of the $n$th
event. Immediately after the $n$th event, let the system be in a state such that $m^n$ different
transitions are possible (where the superscript $n$ is just a label, not a power), and let the
rate constants for those transitions be $k_1^n, \ldots, k_{m^n}^n$. Each of the $k$'s is selected from one of the
$(K_{\mathrm{forward}})_{ij}$, $(K_{\mathrm{on}})_{ij}$, or $(K_{\mathrm{off}})_{ij}$ as appropriate. Note that $m^n = 4u + (w-u) + b$, where $w$ is
the window size, $u$ is the number of unoccupied sites, and $b = 1$ if the first site is occupied
and $b = 0$ otherwise. Then random time intervals $T_1^n, \ldots, T_{m^n}^n$ are chosen according to

$$T_j^n = \frac{-\ln r_j}{k_j^n}, \qquad j = 1, \ldots, m^n, \qquad (1)$$

where the $r_j$ are independent random numbers uniformly distributed on (0,1]. Then
the time of the next event is chosen as

$$t^{n+1} = t^n + T^n, \qquad \text{where } T^n = \min_j T_j^n, \qquad (2)$$

and the index $J^n$ of the transition that occurs is chosen as the value $j$ that achieves the
minimum.
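
The event loop defined by Eqs. (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the results reported here: the rate values in `RATES`, the function name `simulate`, and the repeating test template are all hypothetical, and the forward rate is taken to be independent of $(i,j)$ as in the text.

```python
import math
import random

NTPS = "ACGU"
CORRECT = {"A": "U", "C": "G", "G": "C", "T": "A"}  # template base -> matching NTP

# Hypothetical rate constants (1/s); the values of Table I are not reproduced here.
RATES = {"on_ok": 10.0, "on_bad": 0.1, "off_ok": 1.0, "off_bad": 100.0, "forward": 50.0}


def simulate(template, w=8, t_end=10.0, seed=0, rates=RATES):
    """Return [(time, position)], with one entry per forward step of the window."""
    rng = random.Random(seed)
    t, pos = 0.0, 0                 # pos = template index of the first window site
    bound = [None] * w              # bound[s] = NTP reversibly bound at site s, or None
    trace = [(t, pos)]
    while t < t_end and pos + w <= len(template):
        events = []                 # (rate, kind, site, ntp)
        for s in range(w):
            match = CORRECT[template[pos + s]]
            if bound[s] is None:
                for ntp in NTPS:
                    k = rates["on_ok"] if ntp == match else rates["on_bad"]
                    events.append((k, "bind", s, ntp))
            else:
                k = rates["off_ok"] if bound[s] == match else rates["off_bad"]
                events.append((k, "unbind", s, None))
        if bound[0] is not None:
            events.append((rates["forward"], "forward", 0, None))

        # Eq. (1): one tentative waiting time per transition, r_j uniform on (0, 1].
        waits = [-math.log(1.0 - rng.random()) / k for k, _, _, _ in events]
        # Eq. (2): the transition with the smallest waiting time is the one that occurs.
        j = min(range(len(waits)), key=waits.__getitem__)
        t += waits[j]
        _, kind, s, ntp = events[j]
        if kind == "bind":
            bound[s] = ntp
        elif kind == "unbind":
            bound[s] = None
        else:                       # incorporate at the first site and shift the window
            bound = bound[1:] + [None]
            pos += 1
            trace.append((t, pos))
    return trace


trace = simulate("ATGC" * 50, w=8, t_end=5.0, seed=1)
print(len(trace) - 1, "forward steps")
```

Since `random.random()` returns values in [0, 1), the expression `1.0 - rng.random()` lies in (0, 1], matching the interval stated for the $r_j$. Drawing one waiting time per transition and taking the minimum is statistically equivalent to drawing a single exponential time from the summed rate and then choosing a transition in proportion to its rate, but it follows the formulation above more literally.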

We applied our model to a specific DNA sequence that was used in actual experiments 
Q|. For elongation velocity analysis, we imitated the algorithms used in the experimental 
procedure found in [2] to analyze our data and thereby obtained velocity profiles as well as 
histograms for the distribution of the velocities. Results of our elongation simulations are 
shown in Fig. 2, while the analysis of transcriptional velocity is shown in Fig. 3.
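
To indicate how a simulated trace can be turned into a velocity histogram, the following Python sketch resamples a (time, position) trace on a uniform grid, estimates a local velocity as the least-squares slope in a sliding window, and bins the result. It is only an assumed stand-in: the details of the algorithm in [2] are not given in this letter, and the unweighted window, the function names, and the parameters `dt`, `half_width`, and `bin_width` are choices made here for illustration.

```python
import math


def resample(trace, dt):
    """Sample a piecewise-constant [(time, position)] trace every dt seconds."""
    n = int(trace[-1][0] / dt) + 1
    positions, i = [], 0
    for step in range(n):
        t = step * dt
        while i + 1 < len(trace) and trace[i + 1][0] <= t:
            i += 1
        positions.append(trace[i][1])
    return positions


def window_velocities(positions, dt, half_width):
    """Least-squares slope of position versus time in a centered sliding window."""
    offsets = range(-half_width, half_width + 1)
    denom = sum(o * o for o in offsets) * dt
    velocities = []
    for c in range(half_width, len(positions) - half_width):
        num = sum(o * positions[c + o] for o in offsets)
        velocities.append(num / denom)
    return velocities


def normalized_histogram(values, bin_width):
    """Map the left edge of each bin to the fraction of values falling in it."""
    counts = {}
    for v in values:
        b = math.floor(v / bin_width)
        counts[b] = counts.get(b, 0) + 1
    return {b * bin_width: c / len(values) for b, c in sorted(counts.items())}


# A synthetic trace advancing one basepair every 0.25 s (4 bp/s).
steady = [(0.25 * i, i) for i in range(41)]
vel = window_velocities(resample(steady, dt=0.25), dt=0.25, half_width=2)
print(normalized_histogram(vel, bin_width=1.0))
```

For the steady synthetic trace every window has slope 4 bp/s, so the histogram has a single bin; a trace with pauses would add weight near zero velocity.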

Because our model involves chemical kinetics only, and does not commit to any detailed
physical mechanism, it is consistent either with powerstroke models such as [20] and [11] or
Brownian ratchet models such as [8]. One definite assumption, however, is that the polymerase
motion is unidirectional. We argue that backwards translocation is uncommon for
several reasons: (1) the breaking of a covalent bond of the nascent RNA chain is energetically
unfavorable; (2) at certain sites, the folding of the nascent RNA chain into a hairpin
provides a 'backstop' that prevents the nascent RNA chain from moving backwards; (3) backwards
translocation occurs only under special circumstances, namely during transcriptional
arrest/termination or a complete absence of NTPs [13].

FIG. 2: Results of simulation along the DNA tether used in [2]. (a) Ten elongation
profiles of RNA polymerase traversing the DNA tether. Experimentally, the RNA polymerase
attaches randomly to a location on the DNA strand and then begins the process of transcription;
our model reflects this fact. (b) Closeup of one of the profiles in (a). Note the pause duration of
about 1 to 2 seconds, followed by a period of rapid translocation. This pause could indicate
that the RNA polymerase is waiting for the first site in the window of activity to be occupied. In
our model, it is possible that certain DNA sequences are more amenable to pausing than other
sequences.

The nature of pauses in the motion of RNA polymerase has been much debated. Pausing
is important to understand because it can enable synchronization with the enzymatic
events of translation and regulate the overall speed of transcription. Recent single-molecule
experiments on transcriptional elongation [3| Q| |^| Q have all reached different results 
and conclusions concerning the nature of pausing. Forde et al. [9] have hypothesized that
elongation is a bipartite mechanism, in which the RNA polymerase backtracks followed by 
a conformational change of the polymerase complex, which results in an arrested molecule 
incapable of being rescued by an assisted mechanical force. Bai et al [^] and Shundrovsky 
et al 0] have hypothesized that pausing is the result of backwards translocations along 
the DNA. Neuman et al and Shaevitz et al have hypothesized that a structural 
rearrangement within the RNA polymerase enzyme is the cause of short pausing. Based on
the latter experiments, the majority of pausing has been shown to be short and ubiquitous,
and is not the result of backtracking along the DNA; on the other hand, longer pauses are
hypothesized to occur by an entirely different mechanism.

FIG. 3: (a) Simulation of an elongation profile of a typical RNA polymerase along a specific
DNA tether used in [3]. (b) Linear least squares Gaussian fit of the elongation profile in (a) using
the algorithm found in [2] to obtain a velocity profile. (c) Normalized distribution of the velocity
for the single RNA polymerase elongation profile of (b). (d) Combined normalized distribution for 30
RNA polymerase runs along the DNA tether. The interested reader should compare this figure
with Figure 2 ('Analysis of Elongation Velocity') found in [2]. Note: all simulations use an 8
basepair window of activity.

In our model, the statistics of the motion of RNA polymerase may be described as follows. 
Consider the limit in which the forward rate constant is very fast. Then RNA polymerase 
moves forward every time that the first site within the lookahead window becomes occupied. 
The distribution of the waiting time for this to occur will be exponential with a rate constant 
that may be sequence-dependent. Once a forward step does occur, it may be immediately
followed by one or several additional forward steps, depending on how many adjacent sites 
within the lookahead window happened to be filled at the moment when the first site is
filled. Put another way, the RNA polymerase 'slides' the length of the adjacently occupied 
nucleotides. Such sliding is consistent with the inchworm model p. An interesting property 
of the lookahead model that we have not yet fully explored is the potential role of the 
lookahead feature in preventing transcription errors. Assuming that there is a nonzero 
probability of incorporating an incorrect ribonucleotide covalently into the nascent RNA 
chain, it becomes important to reduce the probability of such an incorrect base being present 
at the site where it would be incorporated. This may be accomplished by having a high 
off-rate for incorrect basepairings, and by allowing sufficient time for this off-rate to be
effective. The lookahead model provides this possibility (in contrast to a model which only 
involves binding followed by a covalent linkage). 
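
The error-protection argument can be illustrated with a small calculation. The Python sketch below compares two limiting cases for a single template site: the fraction of incorrect nucleotides among whichever nucleotide binds first (no time for the off-rate to act), and the fraction among bound nucleotides once the site has reached binding equilibrium (ample time, as the lookahead window may allow). The rate values and function names are assumptions for illustration, and the forward step is ignored in this single-site picture.

```python
def equilibrium_occupancy(k_on, k_off):
    """Equilibrium probabilities of a site being empty or bound by each NTP.

    Detailed balance for a site that is either empty or holds one NTP gives
    P(ntp) / P(empty) = k_on[ntp] / k_off[ntp].
    """
    weights = {ntp: k_on[ntp] / k_off[ntp] for ntp in k_on}
    z = 1.0 + sum(weights.values())
    probs = {ntp: wt / z for ntp, wt in weights.items()}
    probs["empty"] = 1.0 / z
    return probs


def error_fraction_equilibrated(k_on, k_off, correct):
    """Fraction of bound NTPs that are incorrect once the site has equilibrated."""
    p = equilibrium_occupancy(k_on, k_off)
    bound = 1.0 - p["empty"]
    return (bound - p[correct]) / bound


def error_fraction_first_binder(k_on, correct):
    """Probability that the first NTP to bind an empty site is incorrect."""
    total = sum(k_on.values())
    return (total - k_on[correct]) / total


# Hypothetical rates (1/s) for a template base whose correct partner is G.
k_on = {"A": 0.1, "C": 0.1, "G": 10.0, "U": 0.1}
k_off = {"A": 100.0, "C": 100.0, "G": 1.0, "U": 100.0}
print(error_fraction_first_binder(k_on, "G"))
print(error_fraction_equilibrated(k_on, k_off, "G"))
```

With these assumed rates the first-binder error fraction is 0.3/10.3, about 3%, while the equilibrated error fraction is 0.003/10.003, about 0.03%, a hundredfold reduction that comes entirely from giving the off-rate time to act.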

We have presented a chemical kinetic model for RNA polymerase translocation using 
the same DNA sequence that was utilized in actual experiments. The model can be seen 
as formal since we focus on discrete kinetic events, while ignoring more 'continuous' effects 
such as elasticity. The assumption of forward translocation and the nature of pausing in our
model are consistent with the results found in Q and Q; the work of [Uj supports the 
biological basis of our model. Finally, and most importantly, the output of the model can be 
processed, by replicating the data analysis algorithms used in the experimental procedure, 
to produce velocity histograms allowing for direct comparison with experimental results. 

Future work includes: (1) fitting our simulation results to experimental data to obtain
a set of best parameters, which can then be used to test the validity of the model with
data from future experiments; (2) characterizing temporal correlations and statistics of jump
size; (3) studying error-correcting mechanisms; and (4) incorporating nearest-neighbor effects in the
unbinding and binding of ribonucleotide triphosphates. In conclusion, we are only at the
beginning of exploring this model, and we hope to address these issues in a
future publication.